Getting Started: R and BigQuery

This slide deck was built in Quarto!

  • Use keyboard arrow keys to
    • advance ( → ) and
    • go back ( ← )
  • Type “s” to see speaker notes
  • Type “?” to see other keyboard shortcuts

Progress So Far

So far, you’ve:

  • Signed up for GCP
  • Created a project
  • Added BigQuery to that project
  • Looked at public data in another project
  • Used that data to create a query and a table in your project

What’s Left?

  • Exporting your table for use elsewhere
  • Connecting directly to BigQuery from RStudio
  • (Maybe) Vertex AI Workbench

Let’s Export!

Getting Query Results to Save

Since you’ve saved the query you used to create this table, you can re-run that query.

Or, since you’ve created a table that contains all the data you want, you run a “give me everything” query:

SELECT * FROM [table_name]

Saving Data Exercise

Run a query – either your saved query or a “SELECT * FROM…” on your saved table.

Then Click the Save Results button and save the .csv to your computer.

03:00

R vs RStudio

Quick reminder: R is a language, that can be run in many settings:

  • from the command line
  • in an automation (like a cron job)
  • from the R IDE (very bare bones)
  • from the RStudio IDE
  • within a notebook like a Jupyter notebook

RStudio is a fully featured IDE that runs on Linux, Windows, and Mac. It’s much more heavy weight than just the language.

Working with RStudio IDE

Many of us love RStudio. You can spin up an RStudio server in GCP… but you don’t have to, and it might mean spending money when you don’t need to.

You can use your normal RStudio (on your computer or in Posit.cloud) to work with BigQuery.

Following Along

Visit https://github.com/pm0kjp/rmedicine_2024_bigquery/blob/main/bq_demo.qmd and get the code, but we’ll also do this together bit by bit.

Using non-GCP RStudio

Please open whatever RStudio you typically use. https://posit.cloud is a good option!

When you get there, open a new Quarto or R Markdown document.

03:00

bigrquery

There are several ways to connect to BigQuery (see, e.g. https://solutions.posit.co/connections/db/databases/big-query/) but the way I’m going to show you uses a library called bigrquery.

Exercise

Please install bigrquery as well as tidyverse, if you don’t already have that installed.

Then you’re going to start a new Quarto or R Markdown document and add a code chunk that loads these two libraries:

Authenticating

Now you need to authenticate so that RStudio can connect to BigQuery. You’ll next type:

bq_auth()

Use the same Google Identity you’re using for GCP. If it suggests installing httpuv, do it, as otherwise authorization might fail.

Step 1: Store (or not) Credential

Step 2: Sign in With Correct Identity

Step 3: Give permissions

Step 4: Get Authorization Code

You might or might not see this – you might get this code passed back automatically with a message that says

Authentication complete. Please close this page and return to R.

Step 5: Enter Auth Code

Run a Query from RStudio

The pattern is:

  • Set your project_id and my_sql_query objects
  • Send that query to BigQuery using results <- bq_project_query(project_id, my_sql_query)
  • Get the table via df <- bq_table_download(results)

Take a look!

Visit https://github.com/pm0kjp/rmedicine_2024_bigquery/blob/main/bq_demo.qmd and get the code, but we’ll also do this together bit by bit.

Vertex AI Workbench

::: note In the burger menu, select Vertex AI, then Workbench. Enable the api.

:::

Create a Workbench Instance

Important to shut down: https://cloud.google.com/vertex-ai/docs/workbench/instances/shut-down

Setting up R kernel

In a new Terminal in your Jupyter instance, enter the following:

conda create -n r
conda config --add channels conda-forge
conda install -c conda-forge r-base
conda install -c conda-forge r-essentials
conda install -c conda-forge r-tidyverse
conda install -c conda-forge r-stringr
conda install -c conda-forge r-gargle
conda install -c conda-forge r-bigrquery
conda activate r

Answer “y” when prompted.

Start Notebook with R kernel

File > New > Notebook

Select “R” as the kernel